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Abstract 

Newton  methods  for  large-scale  minimization  subject  to  linear  equality  con¬ 
straints  are  discussed.  For  large-scale  problems,  it  may  be  prohibitively  expen¬ 
sive  to  reduce  the  problem  to  an  unconstrained  problem  in  the  null  space  of 
the  constraint  matrix.  We  investigate  computational  schemes  that  enable  the 
computation  of  descent  directions  and  directions  of  negative  curvature  without 
the  need  to  know  the  null-space  matrix.  The  schemes  are  based  on  factorizing 
a  sparse  symmetric  indefinite  matrix.  Three  different  methods  are  proposed 
for  computing  the  desired  directions.  Convergence  properties  for  the  different 
methods  are  established.  .  . 

(  !  y—' 

Keywords:  Linear  equality-constrained  minimization,  negative  curvature, 
modified  Newton  method,  symmetric  indefinite  factorization,  large-scale  mini¬ 
mization,  linesearch  method. 


1.  Introduction 

We  consider  methods  for  finding  a  local  minimizer  of  the  problem 


K  -*>v. 


minimize  f(x) 

M  '  (1.1) 

subject  to  Ax  =  b , 

where  A  is  an  m  x  n  matrix  and  /  £  C2.  We  are  interested  in  the  case  when  n  and 
possibly  m  are  large  and  when  second  derivatives  of  /  are  available.  It  is  assumed 
that  A  is  a  sparse  matrix  of  full  row  rank.  We  also  assume  that  an  initial  feasible 
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point  xo  is  known  and  that  the  level  set  S(xo)  =  {x  :  /(x)  <  /(a. o)»  Ax  =  6}  is 
compact. 

If  (1.1)  is  solved  by  Newton’s  method,  the  step  p  from  the  current  iterate  x  may 
be  defined  a s  p  =  Zpz,  where 

ZTHZp2  =  -ZTg.  (1.2) 

The  columns  of  the  matrix  Z  form  a  basis  for  the  null  space  of  A,  H  denotes  the 
Hessian  matrix  V2/(x)  and  g  denotes  the  gradient  V/(x).  We  shall  refer  to  the 
matrix  ZTH  Z  as  the  reduced  Hessian  and  the  vector  ZTg  as  the  reduced  gradient. 
A  mathematically  equivalent  way  to  obtain  p  is  to  solve  the  equation 

We  shall  refer  to  the  matrix  on  the  left-hand  side  of  (1.3)  as  the  KKT-matrix  and 
denote  it  by  A-. 

Given  p,  the  next  iterate  is  obtained  as  x  +  p.  If  Newton’s  method  converges 
to  a  solution  of  (1.1)  and  the  reduced  Hessian  is  positive  definite  and  Lipschitz 
continuous,  convergence  occurs  at  a  quadratic  rate.  However,  Newton’s  method 
may  not  converge  from  every  starting  point,  and  in  the  neighborhood  of  a  saddle- 
point  or  a  local  maximizer,  a  sequence  of  iterates  generated  by  Newton’s  method 
may  converge  to  such  a  point.  Consequently,  if  a  method  that  generates  a  sequence 
of  improving  estimates  is  required,  some  modification  is  needed.  If  ZTHZ  is  positive 
definite,  p  is  a  feasible  descent  direction,  i.e.,  a  direction  p  such  that  pTg  <  0  and 
Ap  =  0.  In  this  case,  the  objective  function  value  at  x  +  p  may  not  be  reduced 
but  a  step  along  p  may  be  found  that  yields  the  next  improved  iterate.  If  ZTHZ  is 
not  positive  definite,  the  Newton  direction  may  not  be  a  direction  along  which  the 
objective  function  decreases. 

Modifications  of  Newton’s  method  suitable  for  iterates  where  the  Newton  step 
does  not  yield  a  sufficient  decrease  of  the  objective  function  exist.  The  methods 
can  be  divided  into  two  types,  trust-region  methods  and  linesearch  methods.  See 
More  and  Sorensen  [MS84]  and  Dennis  and  Schnabel  [DS89]  for  a  discussions  of 
these  different  types  of  method.  We  focus  on  linesearch  methods  in  this  paper 
and  in  order  to  simplify  the  notation,  we  shall  refer  to  modifications  of  Newton’s 
method  of  linesearch  type  as  modified  Newton  methods.  Such  methods  involve  the 
computation  of  a  feasible  descent  direct' -m  s  and,  if  the  reduced  Hessian  has  at  least 
one  negative  eigenvalue,  a  feasible  d  rf-.cUjn  of  negative  curvature  d.  A  direction  d  is 
said  to  be  a  feasible  direction  of  negati  urvature  if  <ffiH d  <  0  and  Ad  =  0.  If  s  is  a 
direction  of  sufficient  descent  and  d  is  a  direction  of  sufficient  negative  curvature,  the 
convergence  of  a  modified  Newton  method  for  linear  equality-constrained  problems 
follows  from  known  results  on  modified  Newton  methods,  see  for  example  Fiacco 
and  McCormick  [FM68],  Gill  and  Murray  [GM74],  McCormick  [McC77],  Fletcher 
and  Freeman  [FF77],  Mukai  and  Polak  [MP78],  Kaniel  and  Dax  [KD79],  More  and 
Sorensen  [MS79],  Goldfarb  [Gol80]  and  Forsgren  et  al.  [FGM89b].  We  require 
the  computed  directions  s  and  d  to  be  feasible,  i.e.,  As  =  0  and  Ad  =  0.  If  the 
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reduced  Hessian  is  known  and  has  at  least  one  negative  eigenvalue,  a  direction  v  such 
that  vtZtHZv  <  0  may  be  obtained  using  techniques  for  unconstrained  problems. 
Consequently,  d  =  Zv  is  a  feasible  direction  of  negative  curvature.  Similarly,  a 
positive-definite  modification  of  ZTHZ  enables  the  computation  of  a  feasible  descent 
direction.  Feasibility  is  therefore  not  an  issue  if  the  reduced  Hessian  is  known, 
whereas  it  is  not  immediately  apparent  how  to  satisfy  .4s  =  0  and  Ad  =  0  when 
utilizing  K. 

Although  equations  (1.2)  and  (1.3)  are  mathematically  equivalent,  they  differ 
in  the  amount  of  work  required  to  obtain  the  search  direction  p.  To  obtain  p  from 

(1.2) ,  the  reduced  Hessian  is  required.  A  matrix  Z  may  be  obtained  by  forming 
the  Lf/-factorization  of  A,  see  Gill  et  al.  [GMSW87a].  Even  if  Z  is  aparse,  the 
matrix  ZTHZ  may  be  completely  dense,  and  the  amount  of  work  required  to  form 
ZtHZ  explicitly  may  be  prohibitive,  see  Gill  et  al.  [GMSW85].  To  obtain  p  from 

(1.3) ,  the  KKT-matrix  is  required.  If  H  and  A  are  sparse,  they  yield  a  sparse  K . 
Consequently,  if  equations  involving  K  are  solved,  it  is  possible  to  take  advantage 
of  the  sparsity  of  the  problem. 

The  goal  of  the  methods  described  in  this  paper  is  to  compute  search  directions 
directly  from  equations  involving  the  KKT-matrix  K  or  a  modified  KKT-matrix.  We 
prefer  to  use  the  identical  method  to  compute  the  Newton  direction  from  (1.3)  that 
we  would  use  if  it  was  known  in  advance  that  ZTH Z  was  positive  definite.  Also,  a 
method  for  which  there  exists  an  efficient  implementation  for  a  large  sparse  matrix  K 
is  desired.  Accordingly,  we  consider  the  ZZ?ZT-factorization,  described  in  Section  3. 
Such  a  factorization  computes  1ITK II  =  LBLT,  where  77  is  a  permutation  matrix,  L 
is  a  unit  lower- triangular  matrix  and  B  is  a  symmetric  block-diagonal  matrix  whose 
diagonal  blocks  are  of  size  1  x  1  or  2  x  2.  The  permutations  are  performed  in  order  to 
obtain  a  matrix  L  that  is  sparse  and  well-conditioned.  For  unconstrained  problems, 
the  matrices  K  and  H  are  identical.  In  this  case,  More  and  Sorensen  [MS79]  have 
shown  how  to  compute  a  descent  direction  and  a  direction  of  negative  curvature, 
whenever  they  exist,  from  L  and  B.  We  describe  a  pivoting  strategy  in  the  LBLT- 
factorization  of  K,  so  that  the  ability  to  compute  a  feasible  descent  direction  and 
a  feasible  direction  of  negative  curvature  is  achieved  by  a  single  factorization  of 
K,  analogous  to  the  method  of  More  and  Sorensen.  The  computed  directions  are 
shown  to  satisfy  conditions  needed  to  apply  known  convergence  results  for  modified 
Newton  methods,  which  state  that  limit  points  of  a  generated  sequence  of  iterates 
satisfy  the  second-order  necessary  optimality  conditions. 

The  abovementioned  pivoting  strategy  sufficient  for  computing  both  a  descent 
direction  and  a  direction  of  negative  curvature  is  then  shown  to  be  necessary  in  order 
to  guarantee  the  ability  to  obtain  the  directions  from  a  single  factorization  of  K. 
Two  other  methods  for  computing  search  directions  are  described  that  do  not  require 
this  pivoting  strategy.  One  method  always  generates  a  descent  direction,  whereas 
the  other  method  generates  a  search  direction  that  is  either  a  descent  direction 
and/or  a  direction  of  negative  curvature.  Applying  known  convergence  results  for 
modified  Newton  methods,  limit  points  of  a  generated  sequence  of  iterates  using 
these  methods  satisfy  the  first-order  necessary  optimality  conditions.  The  main 
purpose  of  introducing  two  additional  methods  is  that  the  computation  of  the  search 
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direction  is  likely  to  be  cheaper  than  in  the  method  motivated  by  convergence  to  a 
“second-order”  point.  The  intent  is  that  the  three  methods  may  be  mixed  to  obtain 
a  more  efficient  method. 

2.  Basics 

2.1.  Terminology 

Throughout  the  paper,  the  Hessian  V2/(a :)  and  gradient  V/(x)  are  evaluated  at 
points  x  €  S(x0).  Given  an  arbitrary  point  in  S(x o),  H  denotes  the  Hessian  and  g 
the  gradient  at  such  a  point.  The  set  of  KKT-matrices  for  which  H  is  evaluated  at 
some  point  in  S(x o)  is  denoted  by  K.  All  KKT-matrices  considered  belong  to  K,.  For 
an  iterative  sequence  {xjfc}£L0  a  subscript  k  is  included,  so  that  Hk  =  V2/(x*)  and 
gk  =  V/(xfc).  Also,  for  vectors  and  matrices,  the  norm  used  is  always  the  Euclidean 
vector  norm  and  the  corresponding  induced  matrix  norm.  The  vector  e,  denotes 
the  t-th  unit  vector  of  the  appropriate  dimension.  In  a  number  of  lemmas  matrices 
of  zero  dimension  may  arise.  In  such  circumstances,  there  is  no  loss  of  generality  if 
we  make  the  assumption  that  matrices  of  zero  dimension  have  unit  eigenvalues  and 
norm  zero. 

2.2.  The  inertia  of  a  matrix 

Let  M  be  any  symmetric  matrix.  We  denote  by  ip(M),  i„(M)  and  tz(Af)  respectively 
the  (nonnegative)  numbers  of  positive,  negative  and  zero  eigenvalues  of  M.  The 
inertia  of  M — denoted  by  In(M) — is  the  associated  integer  triple  (tp, *„>*»)•  The 
following  lemma  states  an  important  relationship  between  the  inertia  of  the  KKT- 
matrix  K  and  the  reduced  Hessian  ZTHZ. 

Lemma  2.1.  // rank(A)  =  r,  and  Z  is  a  matrix  whose  columns  form  a  basis  for 
the  null  space  of  A,  then  In(K )  =  In(ZTH Z)  +  (r,r,m  -  r). 

Proof.  See  Gould  [Gou85,  Lemma  3.4]  | 

This  lemma  implies  that  K  is  singular  only  if  ZTHZ  is  singular  or  A  is  rank 
deficient.  In  this  paper,  we  assume  that  A  has  full  row  rank,  so  that  singularity  of 
K  always  means  singularity  of  ZTHZ.  The  following  lemma  shows  that  if  A  has 
full  row  rank,  there  is  a  uniform  relationship  between  the  nonsingularity  of  K  and 
ZtHZ. 

Lemma  2.2.  Assume  that  rank(A)  =  m,  and  let  Z  denote  a  matrix  whose  columns 
form  a  basis  for  the  null  space  of  A.  For  a  given  positive  constant  ci,  there  exists 
a  positive  constant  c2  such  that  for  any  KKT-matrix  K  €  1C  having  its  smallest 
singular  value  greater  than  ci ,  the  smallest  singular  value  of  ZTH  Z  is  greater  than 
c2. 

Proof.  If  A  has  full  row  rank,  Lemma  2.1  implies  that  K  is  nonsingular  if  and  only  if 
ZtHZ  is  nonsingular.  For  a  symmetric  matrix,  the  singular  values  are  the  absolute 
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values  of  the  eigenvalues.  The  norm  of  K  is  bounded,  since  S'(xo)  is  compact.  The 
result  follows  by  observing  that  the  eigenvalues  of  K  and  ZTHZ  vary  continuously 
with  H .  | 


3.  The  £5ZT-factorization 
3.1.  The  factorization 

An  efficient  method  for  solving  equations  involving  symmetric  indefinite  matrices  is 
to  use  the  factorization 

nTKn  =  LBLt ,  (3.1) 

where  77  is  a  permutation  matrix  that  represents  column  interchanges  in  K ,  L  is  a 
unit  lower-triangular  matrix,  B  is  a  symmetric  block- diagonal  matrix  whose  blocks 
are  of  size  1  x  1  or  2  x  2.  The  2x2  blocks  are  always  nonsingular  with  one  positive 
and  one  negative  eigenvalue.  We  shall  refer  to  this  factorization  as  the  LBLT- 
factorization.  Various  algorithms  for  computing  the  factors  have  been  proposed,  see 
Bunch  and  Parlett  [BP71]  and  Bunch  and  Kaufman  [BK77]. 

Forming  the  £7?7,r-factorization  may  be  viewed  as  a  step-wise  procedure  that 
repeatedly  computes  Schur  complements  of  decreasing  dimension.  If  K  is  partitioned 
as 


and  T  is  nonsingular,  the  Schur  complement  of  T  in  K  will  be  denoted  by  K/T, 
and  is  defined  as 

K/T  =  G  NT~1Nt. 

The  matrix  I(/T  will  be  referred  to  as  “the”  Schur  complement  when  the  matrix 
T  is  clear  from  the  context.  By  convention,  we  define  the  Schur  complement  as  K 
when  T  has  dimension  zero.  Given  the  partition  from  (3.2),  the  identity 


holds.  In  the  first  step  of  the  factorization,  the  matrix  71  is  a  1  x  1  or  2  x  2  principal 
submatrix  of  K.  Symmetric  row  and  column  interchanges  may  be  necessary  in  order 
to  obtain  a  T  that  is  suitable  as  pivot.  The  elimination  step  yields  one  or  two  rows  of 
Lt  as  (/  T~1Nt )  and  a  diagonal  block  of  B  of  size  1  x  1  or  2  x  2  from  T.  In  the  next 
step,  the  process  is  repeated  for  the  Schur  complement  K/T.  Eventually  the  Schur 
complement  has  dimension  zero,  and  the  algorithm  terminates  with  permutation 
matrix  77,  unit  lower- triangular  matrix  L  and  block-diagonal  matrix  B.  Algorithms 
differ  in  the  way  the  pivot  is  chosen.  In  this  paper  we  do  not  elaborate  on  which 
algorithm  to  use,  except  to  assume  the  algorithm  yields  a  bounded  ||£||.  We  shall 
refer  to  this  algorithm  as  an  algorithm  that  performs  a  regular  £7?7/T-factorization. 
Since  £  is  a  unit  lower-triangular  matrix,  if  the  norm  of  L  is  bounded  then  the  norm 
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of  L~l  is  also  bounded.  Consequently,  it  follows  from  (3.1)  that  K  is  arbitrarily 
close  to  a  singular  matrix  if  and  only  if  B  is  arbitrarily  close  to  a  singular  matrix. 
In  our  applications,  the  inertia  of  K  is  required.  This  inertia  is  readily  available  if 
B  is  known,  since  Sylvester’s  law  of  inertia  yields  In(K)  =  In(B ),  see  [GV89,  page 
416]. 


3.2.  Computational  considerations 

When  K  is  sparse,  the  iHZT-factorization  of  K  may  be  carried  out  in  two  steps. 
First,  in  the  analyze  phase ,  a  symbolic  factorization  based  on  the  nonzero  elements 
in  K  is  performed.  This  symbolic  factorization  yields  an  ordering  of  the  rows  and 
columns  that  attempts  to  minimize  the  number  of  nonzeros  in  L.  Then,  in  the 
numerical  phase,  the  factors  are  computed,  attempting  to  maintain  the  ordering 
given  by  the  analyze  phase.  However,  for  numerical  reasons,  it  may  be  necessary  to 
perform  additional  permutations  in  the  numerical  phase.  Consequently,  the  permu¬ 
tation  matrix  H  in  (3.1)  is  the  combination  of  the  ordering  from  the  analyze  phase 
and  the  additional  permutations  from  the  numerical  phase.  In  a  modified  Newton 
method,  where  a  sequence  of  /['-matrices  is  factorized,  the  analyze  phase  need  only 
be  performed  once,  since  the  positions  of  the  nonzero  elements  in  the  Hessian  remain 
the  same.  A  robust  and  efficient  routine  that  performs  such  a  two-step  factorization 
is  the  Harwell  routine  MA27,  see  Duff  and  Reid  [DR82]  fDR83].  (MA27  may  use 
2x2  pivots  that  are  not  indefinite.  To  fit  the  discussion  here,  such  a  pivot  may  be 
viewed  equivalent  to  two  consecutive  lxl  pivots.) 

3.3.  Definition  of  pivot  types 

The  matrix  B  consists  of  diagonal  blocks  of  size  1  x  1  or  2  x  2,  where  the  2  x  2- blocks 
are  indefinite.  These  diagonal  blocks  are  the  pivots  defined  by  the  factorization 
algorithm.  It  is  of  importance  to  distinguish  between  different  types  of  block. 

A  1  x  1-block  is  defined  to  be  of  type  H +  if  it  is  positive,  and  if  the  element  in 
the  same  position  in  nTKII  is  a  diagonal  element  of  H.  Similarly,  1  x  1-blocks  are 
defined  to  be  of  type  H~  and  H°  if  they  satisfy  the  same  position  requirements  as 
the  blocks  of  type  H+,  but  are  negative  or  zero,  respectively.  We  denote  by  n£,  n~H 
and  n°H  respectively,  the  number  of  such  1  x  1-blocks. 

A  1  x  1-block  is  defined  to  be  of  type  A+  if  it  is  positive,  and  if  the  element  in 
the  same  position  in  nTKII  is  a  diagonal  element  of  the  zero-part  of  K.  Similarly, 
1  x  1-blocks  are  defined  to  be  of  type  A~  and  A0  if  they  satisfy  the  same  position 
requirements  as  the  blocks  of  type  A +,  but  are  negative  or  zero,  respectively.  We 
denote  by  n\,  n~  and  n®  respectively,  the  number  of  such  l  x  1-blocks. 

A  2  x  2-block  is  defined  to  be  of  type  HH  if  the  elements  in  nTKII  with  the  same 
position  are  elements  of  H .  We  denote  by  nHH  the  number  of  such  2  x  2- blocks. 

A  2  x  2-block  is  defined  to  be  of  type  A4  if  the  elements  in  nTKII  with  the 
same  position  are  elements  of  the  zero- part  of  K.  We  denote  by  nAA  the  number  of 
such  2  x  2-blocks. 

A  2  x  2-block  is  defined  to  be  of  type  HA  if  the  elements  in  IITK II  with  the 
same  position  consist  of  one  diagonal  element  from  H,  one  diagonal  element  from 
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the  zero-part  of  K  and  two  elements  from  A. 

3.4.  Applications  in  unconstrained  minimization 

For  unconstrained  problems,  More  and  Sorensen  [MS79]  have  shown  how  to  com¬ 
pute  a  descent  direction  s  and  a  direction  of  negative  curvature  d,  whenever  such 
directions  exist,  using  the  LUT^-factorization  of  the  Hessian  H ,  given  by  IITH  Jl  = 
LBLt.  The  descent  direction  s  satisfies  the  equation  II LBLTIlTs  =  -g,  where  B 
is  a  positive-definite  modification  of  B.  If  H  is  sufficiently  positive  definite,  then 
B  =  B  and  s  is  the  Newton  search  direction.  If  H  is  not  positive  definite,  B  is 
obtained  a s  B  =  B  +  D,  where  D  is  a  block-diagonal  matrix  with  the  same  block 
structure  as  B,  and  D  has  rank  equal  to  the  number  of  nonpositive  eigenvalues  of 
H .  If  H  has  at  least  one  negative  eigenvalue,  the  direction  of  negative  curvature, 
d,  satisfies  the  equation  LTIITd  =  ±\/-^min (B)v,  where  v  is  an  eigenvector  of  unit 
norm  corresponding  to  \m-,„(B),  the  smallest  eigenvalue  of  B.  The  sign  is  chosen  so 
that  gTd  <  0.  For  details,  see  More  and  Sorensen  [MS79]. 

Similarly,  in  the  linear  equality-constrained  case,  a  feasible  descent  direction, 
(assuming  such  a  direction  exists),  may  be  obtained  by  solving 


where  H  is  a  modification  of  H  such  that  ZTHZ  is  positive  definite.  Utilizing  the 
idea  of  More  and  Sorensen  [MS79]  from  the  unconstrained  case,  we  may  try  to  use 
the  LBL ^-factorization  of  K  to  modify  blocks  of  B  of  type  H~ ,  H°  and  HH,  yielding 
a  matrix  B  such  that  LBLT  has  n  positive  eigenvalues  and  m  negative  eigenvalues. 
The  following  lemma  shows  that  there  is  a  sufficient  number  of  blocks  of  type  H~, 
H°  and  HH  to  create  such  a  matrix  B. 

Lemma  3.1.  If  In(K)  =  (tp,in,*z),  where  in  +  iz  -  m  >  0,  then  +  n°H  +  nHH  > 

in  +  iz  -  m. 


Proof.  Assume  the  contrary,  that  n~  +  n°H  +  nHH  <  in  +  iz  -  m.  The  dimensions 


of  H  and  A  imply 

nH  +  nH  +  n°H  +  2nHH  +  nHA  =  n  (3.5a) 

n\  +  nA  +  n°A  +  nHA  +  2  nAA  =  m.  (3.5b) 

If  In(K)  =  (tp, »„,»*),  we  get 

««  +  nHH  -)-  nHA  +  n*  +  nAA  =  ip  (3.6a) 

n~H  +  nHH  +  nHA  +  n~  +  nAA  =  i„  (3.6b) 

n°„  +  n°A  =  iz.  (3.6c) 


Adding  (3.6b)  and  (3.6c)  we  obtain  nHA  +  nA  +  nAA  +  n°A  >  m,  contradicting  (3.5b). 
Thus  nj,  +  n°H  +  <  tn  +  iz  -  m  cannot  hold.  | 
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However,  although  there  are  sufficiently  many  block  elements  of  type  H~,  H°  and 
HH ,  if  no  additional  conditions  are  imposed  on  77,  the  ordering  may  be  insufficient 
to  guarantee  that  only  the  H- part  of  K  is  altered  when  a  block  element  of  type 
H~,  H°  or  HH  is  modified.  Consequently,  if  an  equation  involving  LBLT  is  solved, 
there  is  no  straightforward  analogous  way  of  obtaining  a  descent  direction  s  such 
that  As  =  0.  As  an  example,  consider 


and  A  =  (  2  1  )  , 


so  that 


0 

-2 

1 


If  II  is  the  identity  matrix,  we  obtain 


(  1 

0 

0  > 

(  2 

0 

0  > 

L  = 

0 

1 

0 

and  B  = 

0 

-2 

0 

l  1 

1 

2 

1 J 

1° 

0 

-§  > 

2  N 
1 

0/ 


The  matrix  B  has  n^  =  nj,  =  n~  =  1.  Analogous  to  the  unconstrained  case,  we 
make  the  second  diagonal  element  of  B  positive  by  adding  a  positive  number  d22- 
However,  the  matrix  LBLT  will  differ  from  LBLT  in  the  A- part  and  the  zero-part, 


since 

(  2 

0 

2  ) 

f  0 

0 

°\ 

L(B  +  D)Lt  = 

0 

-2 

1 

+  <*22 

0 

1 

1 

o ) 

1° 

2 

w 

Because  of  the  lower  triangular  structure  of  L,  if  a  diagonal  block  of  B  is  altered,  the 
modification  may  alter  elements  of  the  permuted  K  with  greater  row  and  column 
numbers.  Consequently,  if  there  are  rows  of  A  with  greater  row  numbers,  there  is  a 
danger  of  altering  parts  of  K  other  than  the  H-part. 

It  may  appear  that,  by  analogy  with  the  unconstrained  case,  a  feasible  direction 
of  negative  curvature  may  be  obtained  by  solving  the  equation  LTIITu  =  v,  where 
v  is  an  eigenvector  of  B  corresponding  to  a  negative  eigenvalue  of  a  block  of  type 
H~  or  HH.  (Note  that  such  a  vector  v  always  exists  if  the  reduced  Hessian  has 
at  least  one  negative  eigenvalue,  since  Lemma  2.1  implies  that  K  has  more  than 
m  negative  eigenvalues.  Consequently,  since  there  are  at  most  m  block  elements  of 
type  A~ ,  HA  and  AA,  there  must  exist  a  block  element  of  type  H~  or  HH  for  K 
to  have  more  that  m  negative  eigenvalues.)  The  vector  d  is  then  defined  as  the  first 
n  components  of  u.  However,  the  ordering  of  the  rows  and  columns  of  K  given  by 
H  may  be  insufficient  to  guarantee  that  Ad  =  0.  In  our  example  matrix,  we  obtain 
d  =  (0  1)T  for  v  =  (0  1  0)T,  and  consequently  Ad  =  1^0. 

Conn  and  Gould  [CG84]  have  proposed  an  algorithm  for  finding  feasible  direc¬ 
tions  of  negative  curvature  based  on  the  LBLT- factorization  of  K .  In  addition  to 
the  original  factorization,  the  algorithm  requires  the  factorization  of  a  “triangular- 
like”  matrix  of  dimension  m  x  m.  In  this  paper,  we  adopt  a  different  view  and 
investigate  what  pivoting  strategy  is  necessary  and  sufficient  to  obtain  the  desired 
search  directions  in  a  single  factorization  of  K ,  using  L  and  B  in  an  analogous  way 
to  the  approach  of  More  and  Sorensen  [MS79]  for  the  unconstrained  case. 


4-  Sufficient  Pivot  Conditions 


9 


4.  Sufficient  Pivot  Conditions 


If  K  is  factorized  using  a  regular  LBLT- factorization,  the  permutations  are  per¬ 
formed  in  order  to  obtain  sparsity  of  the  factors  and  boundedness  of  ||£||.  In  the 
example  given  in  Section  3.4,  it  was  shown  that  these  permutations  may  be  insuffi¬ 
cient  for  computing  a  feasible  descent  direction  and  a  feasible  direction  of  negative 
curvature.  In  this  section,  we  investigate  what  conditions  may  be  imposed  on  the 
pivots  in  order  to  ensure  the  ability  to  compute  these  directions,  using  L  and  B, 
whenever  the  directions  exist. 

Upon  completion  of  the  the  factorization  of  K ,  we  have  computed  a  permutation 
matrix  77,  a  lower-triangular  matrix  L  and  a  block-diagonal  matrix  B  such  that 
LBLt  =  nTI<n.  We  will  consider  a  specific  step  in  the  factorization.  Therefore 
the  permutation  matrix  77  is  partitioned  as 

77  =  (  77,  772  ),  (4.1) 


where  the  matrix  n^KIIl  is  the  principal  submatrix  of  K  for  which  the  factors 
have  been  computed.  Consequently,  nxKnx  =  L\\B\L h,  where  L\\  and  B\  are 
the  leading  principal  submatrices  of  L  and  B,  respectively.  Note  that  at  this  step 
of  the  factorization,  the  matrix  772  has  not  yet  been  determined. 

The  matrix  nxl<nx  is  a  principal  submatrix  of  K,  and  we  introduce  a  permu¬ 
tation  77  for  which 


H1KH  = 


/ 

Hu 

Hu 

Al 1 

Al i 

\ 

Hi\ 

7/22 

An 

A  22 

An 

An 

0 

0 

\ 

42i 

A22 

0 

0 

/ 

of 

Htk 

77  are 

such 

that 

if 

Kn 


-( 


H  a 

An 


Ah 


(4.2) 


the  matrix  77^7('771  is  a  permuted  version  of  Kn.  The  matrix  Kn  is  a  principal 
submatrix  of  K  such  that  H\\  is  an  nj  x  nj  principal  submatrix  of  H ,  and  An  is 
an  rii  x  mi  submatrix  oi  A.  To  simplify  the  notation,  we  say  that  K\\  contains  n \ 
rows  of  77  and  mi  rcws  of  A. 

Since  nxKnx  denotes  the  part  of  K  that  has  been  factorized  at  a  particular 
step,  its  size  increases  from  step  to  step.  We  say  that  1J^KII1  and  the  associated 
A'n  are  expanded  when  one  step  in  the  factorization  is  performed.  For  example,  if 
in  one  step  of  the  factorization,  a  pivot  of  type  HA  is  selected,  the  size  of  HjKnx 
and  Kn  is  increased  by  two  and  both  rii  and  mx  are  increased  by  one. 

It  is  shown  in  this  section,  that  by  selecting  sufficiently  nonsingular  pivots  of  type 
H+ ,  A~  and  HA,  until  the  corresponding  Kn  contains  all  rows  of  A,  the  ability  to 
compute  a  feasible  descent  direction  and  a  feasible  direction  of  negative  curvature  is 
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achieved.  A  matrix  Kn  created  in  such  a  way  will  prove  crucial  for  the  computation 
of  descent  directions  and  directions  of  negative  curvature. 

Properties  of  IlfKIIi  that  we  want  to  describe  are  also  properties  of  Kn-  In 
particular,  the  eigenvalues  of  K\\  and  n^KIIl  are  identical.  We  consider  Ku 
since  its  properties  are  easier  to  describe.  The  following  lemma  gives  an  explicit 
representation  of  A'j'j1  . 

Lemma  4.1.  If  Kn  is  nonsingular,  then 

A^V1  _(  Z(ZTHnZ)~'ZT  Y  \ 

11  ”  \An  0  )  "  V  ~YTHuY  J  ’ 

«)here  Z  is  an  orthonormal  matrix  whose  columns  form  a  basis  for  the  null  space  of 
An,  and  the  matrix  Y  satisfies  the  matrix  equation 

(I:  ?)(.')-(:)■ 

where  A  =  YTHUY.  If  An  is  square,  the  matrix  Z{ZTHnZ)~xZT  is  to  be  inter¬ 
preted  as  a  zero  matrix  of  dimension  m\  x  m\. 

Proof.  Direct  substitution  in  (4.3)  shows  that  A  =  YTHnY.  Lemma  2.1  shows 
that  nonsingularity  of  Kn  yields  nonsingularity  of  ZTHnZ.  Thus,  it  remains  to 
verify  that 

(I:: 

This  holds  if  and  only  if  HnZ(ZTHnZ)~xZT+AiiYT  =  I.  If  An  is  square,  then  Y  — 
A^i,  (4.4)  holds  for  Z{ZTHnZ)~x ZT  =  0.  Otherwise,  (4.3)  gives  ZTHnY  =  0 
and  it  follows  that 

(%){  HnZ(ZTH„Z)-'ZT+Al,YT)  =  (fT)-  (4-*) 

The  proof  is  now  completed  by  showing  that  (  2  Y  )  is  nonsingular.  Assume  that 
Zvz  +  Yvy  =0.  Premultiplication  by  An  yields  vy  =  0.  Since  the  columns  of  Z  axe 
linearly  independent,  vz  =  0.  | 

The  following  lemma  shows  that  when  II^K II x  has  inertia  (ni,mi,0),  a  row  in 
nTK n / njKIIi  corresponding  to  a  row  of  A  cannot  be  almost  linearly  dependent 
on  the  other  rows  of  IITK II /IlfKIIi  unless  A  is  almost  rank-deficient. 


Lemma  4.2.  For  given  positive  constants  C\  and  M ,  let  K  be  any  KKT-matrix 
in  1C  with  a  principal  submatrix  Kn  such  that  In(Kn)  =  (ni,mi,0),  where  the 
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smallest  singular  value  of  K\\  is  greater  than  c\.  Define  a  matrix  S,  which  is  a 
permuted  version  of  the  Schur  complement  IITK 77/ IljK nx,  in  partitioned  form  as 


(  Sn  S12\(  H22  A&  \  _  /  H2X  A{2  \  (  Hu  Ajx  \  '  (  H12  a£  \ 

\  S2\  S22  J  \  A22  0  J  \  A2\  0  J  \  An  0  /  \  A\ 2  0  J 

If  mx  <  m,  there  exists  a  positive  constant  c2,  such  that  if 


efSnei  < 


2  Mn 


and 


(4.6a) 

(4.6b) 


for  some  i,  then 


zh^ 

where  Y  is  defined  in  (4-3)  and  (  is  any  nonnegative  scalar. 


(4.7) 


Proof.  Assume  that  mi  <  m  and  that  there  is  an  i  such  that  (4.6a)  and  (4.6b) 
hold  for  some  nonnegative  c.  Utilizing  the  notation  of  Lemma  4.1  we  obtain 

Su  =  H22  -  H2iZ(ZTHnZ)~lZTHn  -  Aj2YTH12  -  7721yA12  +  aJ2ythuyau 
Sn  =  A\2  -  H2lZ{ZTHnZy'ZTA &  -  Atx2YtaIx 
S2\  =  A22  —  A2XZ(Z^'HxxZ)  1  Z^H\2  ~  A21YAn 
S22  =  -  A2\Z(ZTHi\Z)~x  zta2X. 


The  definition  of  S22  yields 

~eJS22ei  =  ejA2iZ(ZTHuZ)-1ZTA%1ei. 

If  the  smallest  singular  value  of  Kxx  is  greater  than  a  positive  constant  cj  and 
In(Kn)  =  (ni,TOi,0),  Lemmas  2.1  and  2.2  guarantee  that  ZTHXXZ  is  positive 
definite  with  its  smallest  eigenvalue  greater  than  a  positive  constant.  By  the  assumed 
compactness  of  S(io),  the  elements  in  Hu  are  bounded  in  magnitude.  Consequently, 
there  exists  a  positive  constant  cx ,  such  that 

^  -efS22e,  >  cx\\ZTA^xei\\2,  (4.8) 

where  the  left-hand  inequality  follows  from  the  assumption  that  (4.6a)  holds.  Using 
the  definition  of  S2i  we  obtain 

efA22  -  efAuYAn  =  ejs2l  +  ejA2lZ(ZTHxxZyx  ZrHu. 

It  follows  from  this  equation,  (4.6b),  (4.8),  the  nonsingularity  of  Kn  and  the  bound¬ 
edness  of  ||7/||  that  there  exists  a  positive  constant  c2,  such  that 

||efA22  -  efA2iyAi2||  <  c2y/c. 


(4.9) 
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The  identities  A\ \Y  =  I  and  1AnAjlei  +  ZZ^^e,  yield 

ejA2iYAn  =  e?A21  -  ejA21ZZT+  ejA2lZZTYAu. 

Utilizing  (4.8)  we  obtain 

WefA2\  ~  e[A2iY /4n||  <  (4.10) 

for  some  positive  constant  C3.  Consequently,  (4.9)  and  (4.10)  yield  the  existence  of 
a  positive  constant  c2,  such  that 

t 

1 


In  the  following  lemma  we  use  the  full-rank  assumption  on  A  and  Lemma  4.2  to 
show  that  if  K\\  has  exactly  m\  (mi  <  m)  negative  eigenvalues  and  has  its  smallest 
singular  value  greater  than  a  positive  constant,  it  is  always  possible  to  expand  /fn 
so  that  mi  is  increased  by  one. 

Lemma  4.3.  For  a  given  positive  constant  c\,  let  K  be  any  KKT-matrix  in  K.  with 
a  principal  submatrix  Kn  such  that  In(Kn)  =  (ni,mi,0),  where  the  smallest  sin¬ 
gular  value  of  Kn  is  greater  than  C\.  Assume  that  the  factorization  of  II^K JIY 
is  known,  and  that  the  next  pivot  is  to  be  chosen  from  the  Schur  complement 
IITK II / n^KIIl.  If  mx  <  m,  there  exists  a  positive  constant  C3  such  that  there 
is  a  pivot  in  IITK II / II^K II x  of  type  A~  or  HA  whose  determinant  is  less  than  -C3. 

Proof.  Let  ci  be  a  given  positive  constant,  and  K  any  KKT-matrix  in  K  with 
a  principal  submatrix  K\\  such  that  In(Ku)  =  (^li^i^),  where  the  smallest 
singular  value  of  K\\  is  greater  than  ci .  Using  the  terminology  of  Lemma  4.2,  if  Kn 
has  inertia  (ni,mi,0)  and  the  smallest  singular  value  greater  than  ci,  Lemma  4.1 
and  the  definition  of  S  guarantee  the  existence  of  a  constant  M  such  that  j|S||  <  M. 
Let  ef(S2i  S22)  be  a  row  of  S  corresponding  to  a  row  of  A  and  let  l  denote  the 
corresponding  row  number  of  A.  Since  A  has  full  row  rank,  there  exists  a  positive 
constant  c  such  that 

lleM~  Y,aieTiAW  ^  1  (4-n) 

for  all  scalars  aj.  Given  ci  and  M,  let  c2  denote  the  constant  from  Lemma  4.2.  Let 
c  denote  the  positive  constant  defined  by  c2y/l  =  c/2.  For  this  choice  of  e,  at  least 
one  of 

-  eJS22ei  >  — and  (4.12a) 

ZMn 

lkrS2i||  >  v/i,  (4.12b) 

must  hold,  since  if  (4.12a)  and  (4.12b)  do  not  hold,  Lemma  4.2  implies  that  there 
exist  aj- s  such  that 

II eJA  ~  ^2<*jejA\\  <  c2VI, 


(4.13) 
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which  contradicts  (4.11).  It  is  now  shown  that  if  (4.12a)  holds,  there  exists  a  pivot 
of  type  A~  and  if  (4.12a)  does  not  hold  but  (4.12b)  holds  there  exists  a  suitable 
pivot  of  type  HA.  Assume  that  (4.12a)  holds.  Then  eJS22^i  is  a  pivot  of  type  A~ 
whose  determinant  is  less  than  -c/(2Mn).  Assume  that  (4.12b)  holds  but  (4.12a) 
does  not  hold.  If  (4.12b)  holds,  it  must  hold  that  n  -  nj  >0,  since  the  length  of  the 
row  vector  efS 21  is  n  —  n\.  Moreover,  there  must  exist  a  j  such  that 

\eJS2iej\  >  J  1  .  (4.14) 

y  n  —  Tii 

For  this  j,  let  P  denote  the  2x2  matrix  defined  as 

p_  (  eJSuej  efSi2e,  \ 

\  efS2\ej  efS22et  ) 

By  assumption,  (4.12a)  does  not  hold  and  consequently,  since  eJS22^i  <  0,  it  holds 
that  \eJS2ie%\  <  i/(2Mn).  This  assumption  and  the  inequality  | eJSnej\  <  M  yield 

det(P)  <  7^  -  \eJS2iej\2. 

It  follows  from  (4.14)  that  \eJS2\ej\  >  \Je/n,  and  consequently 

The  determinant  of  P  is  negative,  and  therefore  P  must  have  one  positive  and  one 
negative  eigenvalue.  Consequently,  P  is  a  pivot  of  type  HA  whose  determinant  is 
less  than  a  negative  constant.  Therefore,  there  exists  a  positive  constant  C3  for  which 
there  is  a  pivot  of  type  A~  or  HA  whose  determinant  is  less  than  -C3.  I 

Based  on  this  lemma,  the  factorization  algorithm  is  stated  in  Algorithm  4.1. 
Initially,  the  matrix  K\\  has  dimension  zero,  and  for  such  a  matrix  the  assumptions 
of  Lemma  4.3  hold.  It  follows  from  Lemma  4.3  that  there  exists  a  constant  pivot 
tolerance  tol  so  that  there  is  a  pivot  P  of  type  H+ ,  A~  or  HA  for  which  |  det(P)|  > 
tol.  K\\  is  expanded  using  this  pivot.  This  process  is  repeated  until  K n  contains 
all  rows  of  A.  When  K\\  contains  all  rows  of  A  it  is  considered  the  “final”  A'n- 
matrix,  and  the  remaining  Schur  complement  is  factorized  using  a  regular  LBLT- 
factorization. 

In  order  to  show  that  this  algorithm  is  well  defined,  it  is  essential  to  show  that 
the  required  properties  of  /fn  are  maintained  when  Kn  is  expanded.  The  following 
lemma  shows  that  this  is  true. 

Lemma  4.4.  For  a  given  positive  constant  cj,  let  K  be  any  KKT-matrix  in  fC  with 
a  principal  submatrix  K\\  such  that  In{K\\)  —  (ni,mi,0)  and  the  smallest  singular 
value  of  Kn  is  greater  than  c\.  Assume  that  the  factorization  of  nJ^K ni  is  known, 
and  that  the  next  pivot  P  is  of  type  H+,  A~  or  HA  and  has  |  det(P)|  >  C3,  where  C3 
is  a  positive  constant.  Then,  there  exists  a  positive  constant  c4  such  that  if  A'n  is 
expanded  using  P  as  pivot,  the  expanded  Kn  has  inertia  (ni,mi,0)  and  its  smallest 
singular  value  is  greater  than  c4. 
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Algorithm  4.1.  A  factorization  of  K. 

tol  <—  pivot  tolerance; 

repeat 

ex  <—  pivot  P  of  type  H+ ,  A~  or  HA  exists  with  |  det(P)|  >  tol ; 
if  ex  then 
pivot  <—  P; 

else 

Set  tol  to  the  largest  value  for  which  a  pivot  P  of  type  H* ,  A~  or 
HA  exists  with  |det(P)|  =  tol ; 
pivot  *-  P; 
end  if 

Expand  K\\  using  P  as  pivot; 

Update  mx  and  i»i; 
until  mx  =  m 

Factorize  the  remainder  of  K  using  a  regular  /.PL7- algorithm; 


Proof.  If  Kn  is  expanded  using  a  pivot  P  of  type  H+,  A~  or  HA,  the  expanded 
matrix  still  has  inertia  (nx,mj,0).  The  existence  of  c4  remains  to  be  established. 
Let  Kn  denote  the  expanded  A'n-matrix.  Since  the  smallest  singular  value  of  Ku 
is  greater  than  ci,  there  must  exist  a  positive  constant  ci  such  that  |  det(A'n)|  >  c\. 
It  follows  from  (3.3)  that  det(A'u)  =  det(jjfn)det(P).  If  |det(P)|  >  c3,  it  follows 
that  |det(A'n)|  >  cic3.  Consequently,  since  the  norm  of  K  is  bounded,  there  exists 
a  positive  constant  C4  such  that  the  smallest  singular  value  of  Ku  is  greater  than 
c4.  | 

This  lemma  shows  that,  since  A  has  full  row  rank,  a  A'n-matrix  with  inertia 
(ni,mi,0)  and  the  smallest  singular  value  greater  than  a  positive  constant,  may  be 
expanded  until  it  contains  all  rows  of  A  and  has  m  negative  eigenvalues.  Moreover, 
the  smallest  singular  value  of  the  final  Kn -matrix  is  bounded  away  from  zero.  Note 
also  that  the  pivot  tolerance  tol  is  bounded  away  from  zero  by  a  constant,  since  it 
is  greater  than  a  positive  constant  for  each  step  in  the  expansion  of  Kn- 

If  it  is  required  that  only  sufficiently  nonsingular  pivots  of  type  H+ ,  A~  and  HA 
are  utilized,  until  a  matrix  77x  containing  all  rows  of  A  is  created,  this  will 
impose  other  permutations  than  the  ones  provided  in  a  regular  algorithm  for  the 
LBLt- factorization.  Consequently,  it  is  essential  that  the  boundedness  of  the  norm 
of  the  L-matrix  is  maintained.  Since  the  pivots  of  the  associated  Bj-matrix  all  have 
singular  values  bounded  away  from  zero  by  a  constant,  the  following  lemma  shows 
that  the  boundedness  of  the  norm  of  L  is  maintained. 

Lemma  4.5.  Assume  that  77^A'/71  =  LnB\L\x,  with  its  smallest  singular  value 
greater  than  a  positive  constant.  Furthermore,  assume  that  ||Ln||  and  ||Z«2i||  are 
bounded  by  a  constant.  If  Kn  is  expanded  by  a  pivot  that  has  the  absolute  value  of 
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its  determinant  bounded  away  from  zero  by  a  positive  constant,  then  the  expanded 
L\\  and  L21  still  have  norm  bounded  by  a  constant. 

Proof.  Let  P  denote  the  pivot.  Since  K  and  (/717A'/71)-1  have  bounded  norm,  it 
follows  that  the  norm  of  IITK n/IljK 77j  is  bounded.  Consequently,  the  norm  of  P 
is  bounded  by  a  constant.  If  in  addition,  the  determinant  of  P  is  bounded  away  from 
zero  by  a  constant  in  absolute  value,  the  norm  of  P~l  is  bounded  by  a  constant. 
Therefore,  the  expanded  column  or  columns  of  L21  are  bounded  in  norm,  since  it 
follows  from  (3.3)  that  they  are  formed  by  postmultiplying  one  or  two  columns  of 

nTK n / iiiKnl  by  p-1.  i 

When  a  matrix  K\\  that  contains  all  rows  of  A  has  been  created  by  accepting 
pivots  of  type  77+,  A~  and  HA ,  as  described  in  Algorithm  4.1,  it  is  considered  the 
final  ifn-matrix.  For  the  rest  of  this  section,  the  matrix  K\\  is  assumed  to  be  such 
a  final  ifn-matrix.  The  factors  from  the  factorization  described  in  Algorithm  4.1 
are  also  assumed  to  be  known.  For  this  final  A'n-niatrix,  the  partition  of  77  from 
(4.1)  is  used.  This  partition  induces  a  partition  of  L,  B  and  HTKH  as 

(  njKn,  nlKih  W  i„  o  Ui,  o  \  (  i*  4,  \ 

V  njKn,  njKn ,  )  Ui  fe/l  •  Si/l  «  ) '  ' 

If  Ku  contains  all  rows  of  A,  all  blocks  in  Bi  correspond  to  pivots  of  type  77+, 
H~  or  HH.  This  means  that  modifications  of  Bi  only  alter  the  77-part  of  7v .  If  the 
factorization  of  K  given  in  Algorithm  4.1  is  known,  the  following  theorem  shows 
how  to  obtain  a  feasible  descent  direction. 

Theorem  4.1.  Let  K  be  any  KKT-matrix  in  1C.  Assume  that  the  LBLT -  factori¬ 
zation  of  K  given  in  Algorithm  4-1  is  known.  Assume  that  the  eigenvalue  decompo¬ 
sition  of  7?2  is  known,  as  B?  =  U  AUT.  For  a  positive  tolerance  ctol>  let  the  i-th 
diagonal  element  of  the  matrix  A,  denoted  by  A,,  be  defined  using  the  i-th  diagonal 
element  of  A,  denoted  by  Ai,  as  A,  =  max{|A, \,ctol),  and  let  B 2  =  U AUT.  Let 
B\  =  B\  and  let  B  denote  the  matrix  consisting  of  the  diagonal  blocks  B\  and  B2. 

Let  z  =  ( sT  p7)7,  where  s  is  an  n-vector  and  p  is  an  m-vector,  be  defined  as  z  =  775, 

where  z  satisfies  the  equation 


LBLTz  =  nT ^  jj. 

Then,  s  =  Zu  for  some  u,  and  there  exist  positive  constants  c\  and  C2,  such  that 

-gTZu>cx\\ZTg\\:i  and  \\ZTg\\>  c2\\u\\. 

Proof.  Since  the  matrix  Ku  has  its  smallest  singular  value  greater  than  a  pos¬ 
itive  constant,  the  compactness  of  S(x 0)  and  the  boundedness  of  ||L-1||  guaran¬ 
tee  that  ||£2||  is  bounded  by  a  constant.  Consequently,  B2  is  a  bounded  block- 
diagonal  matrix  whose  smallest  eigenvalue  is  greater  than  cjol-  Consequently,  since 
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In(B\)  =  (ni,m,0)  and  the  norm  of  L  is  bounded,  the  matrix  LBLT  has  inertia 
(n,m,0)  and  all  singular  values  bounded  away  from  zero  by  a  constant.  Since  all 
rows  of  A  are  contained  in  K\\,  the  modification  of  B2  only  alters  H.  Consequently, 


where  H  is  a  modification  of  H,  and  z  satisfies  the  equation 

from  which  it  follows  that  s  =  Zu  for  some  u  satisfying 

u  =  -(ZTHZ)~lZTg.  (4.16) 

Utilizing  the  boundedness  of  ||/7||,  premultiplication  by  gTZ  in  (4.16)  yields  the 
existence  of  a  positive  constant  cj,  such  that  —gTZu  >  Ci||ZI5||2.  Lemmas  2.1  and 
2.2  imply  that  ZTHZ  has  its  smallest  eigenvalue  greater  than  a  positive  constant. 
Consequently,  utilizing  norm  properties  and  the  boundedness  of  ||(ZTffZ)~1||,  (4.16) 
guarantees  the  existence  of  a  positive  constant  C2,  such  that  c2||ul|  <  ||Zr</||.  | 

Note  that  the  direction  s  computed  in  Theorem  4.1  is  the  Newton  direction 
whenever  B2  is  sufficiently  positive  definite,  i.e.,  when  the  reduced  Hessian  is  suf¬ 
ficiently  positive  definite.  Only  if  the  reduced  Hessian  is  not  sufficiently  positive 
definite  is  the  matrix  B  different  from  B. 

If  K\\  is  nonsingular,  contains  all  rows  of  A  and  has  m  negative  eigenvalues,  but 
K  has  more  than  m  negative  eigenvalues,  there  exist  feasible  directions  of  negative 
curvature.  Moreover,  since  In(K )  =  ln(njKIIx)  +  ln(IITK 77 / IljK nx),  it  must 
hold  that  IITK II / IIXK IIX  has  at  least  one  negative  eigenvalue,  and  consequently 
B2  must  have  at  least  one  negative  eigenvalue.  Because  of  the  structure  of  B2,  the 
smallest  eigenvalue  of  B2,  and  its  corresponding  eigenvector  are  readily  available. 
The  following  theorem  shows  that  the  factors  from  the  XHZ,7- factorization  of  K 
given  in  Algorithm  4.1  enable  the  computation  of  a  feasible  direction  of  negative 
curvature. 

Theorem  4.2.  Let  K  be  any  KKT-matrix  in  K  with  more  than  m  negative  eigen¬ 
values.  Assume  that  the  LBLT -factorization  of  K  given  in  Algorithm  4-1  is  known. 
Let  w  =  ( dT  x1)7,  where  d  is  an  n-vector  and  ir  is  an  m-vector,  be  defined  as 
w  =  77 ti>,  where  w  satisfies  the  equation 

<«.«) 

where  u\  is  an  eigenvector  of  unit  norm  corresponding  to  ^min(B2),  the  smallest 
eigenvalue  of  B2,  and  the  sign  is  chosen  so  that  gTd  <  0.  Then,  d  satisfies  d  =  Zv 
for  some  vector  v  which  is  bounded  in  norm,  and  there  exists  a  positive  constant  c 
such  that 

vtZtHZv<-c)^(ZtHZ). 
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Proof.  From  (4.17)  and  the  definition  of  w  we  obtain 

Kw  =  ±y/-Xmi a(B2)n  ^  lJB2Ux  )  •  (4.18) 

Since  K\\  contains  all  rows  of  A,  (4.18)  yields  Ad  =  0.  Consequently,  d  =  Zv  for 
some  v  and  it  follows  from  (4.17)  that 

vtZtHZv  =  xvtKw  =  -A  L„(52).  (4.19) 

The  boundedness  of  ||Z||  and  the  identity  L22B2LJ2  =  BJK II / IJjK II1  guarantee 
the  existence  of  a  positive  constant  c\ ,  such  that 

A„tf„ (B2)  <  cXrrani^Kn/n^Kn^  <  0.  (4.20) 

Since  K\\  has  inertia  (ni,m,0),  it  contains  a  nonsingular  m  X  m  submatrix  AB 
of  A.  Without  loss  of  generality,  we  may  assume  that  the  partition  of  A  is  such 
that  A  =  (  Ab  As  ),  where  AN  is  the  remaining  submatrix  of  A.  Let  Z  be  a 
reduced-gradient-type  null-space  matrix  generated  by  AB  defined  as 

z=(-'4y4"). 

Properties  of  TItK 77 / II^K II x  are  independent  of  the  pivoting  order  in  which  K\\  is 
created.  Without  loss  of  generality,  assume  that  A'n  was  created  by  first  factorizing 
the  2m  x  2m  principal  submatrix  of  K  containing  AB.  Then,  with  appropriate 
substitution  in  the  definition  of  S  in  Lemma  4.2,  it  follows  that  ZTH Z  is  a  permuted 
version  of  the  Schur  complement  of  this  2m  x  2m  principal  submatrix  in  nTKIJ . 
Moreover,  the  2m  x  2m  principal  submatrix  has  inertia  (m,  m,  0),  and  since  the 
inertia  of  (/7T/r/7//71rAr771  )  is  (ni,m,0),  it  must  hold  that  when  the  2m  x  2m 
principal  submatrix  has  been  processed,  the  remaining  n\  -  m  pivots  until  K\\  has 
been  processed  are  all  positive.  Hence,  [FGM89b,  Lemma  3.3]  implies  that 

Amin( /7 tk n / nji< nx)  <  x^n{zTHZ)  <  o.  (4.21) 

Since  there  exist  only  a  finite  number  of  nonsingular  m  x  m  submatrices  of  A ,  it 
may  be  asserted  that  there  exists  a  positive  constant  C2,  such  that 

A „UZtHZ)  <  c2Amin(ZT/7Z)  <  0.  (4.22) 

Combining  (4.19),  (4.20),  (4.21)  and  (4.22),  there  exists  a  positive  constant  c,  such 
that 

vTZTHZv  <  -c\mm{ZTHZ)2. 

Since  L  is  unit  lower-triangular,  boundedness  of  ||L||  implies  boundedness  of  ||L-1||. 
Since  all  block-diagonal  elements  in  B\  have  norm  greater  than  a  positive  constant, 
|| ^2 1|  is  bounded.  Consequently,  (4.17)  and  the  definition  of  v  guarantee  that  the 
norm  of  v  is  bounded.  | 
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The  strategy  for  choosing  pivots  given  in  this  section  allows  the  construction  of 
a  nonsingular  principal  submatrix  of  K  that  contains  all  rows  of  A  and  has  exactly 
m  negative  eigenvalues.  This  strategy  only  permits  pivots  of  type  H+,  A~  and  HA 
until  all  rows  of  A  have  been  processed.  However,  as  the  following  lemma  shows,  if 
the  full  Hessian  is  sufficiently  positive  definite,  the  pivot  strategy  described  in  this 
section  is  of  no  impact  at  all,  since  all  pivots  in  such  circumstances  are  of  type  H +, 
A~  and  HA. 

Lemma  4.6.  If  H  is  positive  definite  and  A  has  full  row  rank,  then  the  only  pivots 
that  occur  in  an  LBLT- factorization  of  K  are  of  type  H+,  A~  or  HA. 

Proof.  Since  K  is  nonsingular,  no  pivots  of  type  H°  or  A0  are  used.  The  positive 
definiteness  of  H  implies  that  any  principal  submatrix  of  H  is  positive  definite. 
Consequently,  any  nonsingular  principal  submatrix  K\\  has  inertia  (r»i,mi,0).  This 
property  cannot  hold  if  pivots  of  type  H~,  A* ,  HH  or  AA  are  used.  | 

The  conclusion  of  the  lemma  does  not  hold  if  the  Hessian  is  not  positive  definite. 
Consequently,  the  strategy  described  in  this  section  may  require  permutations  in 
addition  to  those  required  for  a  regular  LPZ,r-algorithm.  For  example,  a  pivot  of 
type  H~  would  not  be  accepted  unless  all  rows  of  A  have  been  processed.  From 
the  point  of  view  of  sparsity,  these  additional  permutations  are  undesirable  since  we 
wish  to  minimize  the  change  to  the  ordering  determined  by  the  analyze  phase. 

One  possible  scheme  for  choosing  the  pivots  of  type  H+,  A~  and  HA  is  to  use  m 
pivots  of  type  HA.  Such  a  scheme  has  been  proposed  by  Gill  et  al.  [GMSW87b]  in 
the  context  of  large-scale  quadratic  programming.  They  name  the  HA  pivot  a  tile 
and  refer  to  this  scheme  as  tiling. 

Finally,  if  the  problem  is  unconstrained,  the  matrix  K  equals  H,  and  the  pivot 
strategy  described  in  this  section  equals  the  strategy  of  a  regular  L  J9£r-factorization, 
since  if  A  does  not  exist,  all  rows  of  A  are  processed  at  the  very  first  step  of  the 
factorization.  The  computed  directions  are  then  equivalent  to  those  computed  by 
More  and  Sorensen  [MS79]. 

5.  Necessary  Pivot  Conditions 

In  the  previous  section  it  was  shown  that  a  sufficient  condition  for  the  computation 
of  a  descent  direction  and  a  direction  of  negative  curvature  is  to  use  pivots  of  type 
//+,  A~  and  HA  until  all  rows  of  A  have  been  processed.  This  scheme  creates  a 
principad  submatrix  nfKni  which  contains  all  rows  of  A,  has  inertia  (ni,m,0)  and 
its  smallest  singular  value  is  greater  than  a  positive  constant.  However,  it  may  also 
be  necessary  to  obtain  such  a  principal  submatrix  in  order  to  compute  a  feasible 
descent  direction  and  a  feasible  direction  of  negative  curvature  as  in  the  scheme 
of  More  and  Sorensen  [MS79].  In  the  example  of  Section  3.4,  since  77  =  /,  the 
only  principal  submatrix  obtained  in  the  factorization  that  contains  all  rows  of  A 
is  K  itself.  If  a  block  of  type  A~ ,  AA  or  A0  is  modified  then  the  zero-part  of  K  is 
altered  and  the  subsequent  search  direction  is  not  feasible.  Hence,  in  the  example 
of  Section  3.4,  the  only  part  of  B  that  may  be  modified  is  the  second  element. 
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However,  it  was  shown  that  such  a  modification  altered  both  the  A-part  and  the 
zero-part  of  K .  Similarly,  when  computing  the  direction  of  negative  curvature  using 
an  eigenvector  of  B  as  a  right-hand  side  vector,  only  eigenvectors  corresponding  to 
blocks  of  type  H~  and  HH  may  be  used.  Again,  the  example  of  Section  3.4  shows 
that  solving  for  the  eigenvector  corresponding  to  the  second  element  of  B  does  not 
yield  a  feasible  descent  direction. 

It  was  shown  in  Section  4  that  a  sufficient  condition  for  obtaining  a  nonsingular 
principal  submatrix  containing  all  rows  of  A  with  m  negative  eigenvalues  was  to 
use  pivots  of  type  H+ ,  A~  and  HA  until  all  rows  of  A  had  been  processed.  Such 
a  scheme  may  appear  unduly  restrictive.  However,  in  this  section  it  is  shown  that 
these  conditions  are  also  necessary  in  order  to  guarantee  the  ability  to  obtain  such  a 
principal  submatrix  in  a  single  factorization.  We  utilize  the  notation  of  Section  4  and 
let  nfKn^  denote  a  leading  principal  submatrix  of  nTKn ,  for  which  the  factors 
Lu  and  B\  are  known.  Also,  to  simplify  the  notation,  the  related  matrix  A'n  from 
(4.2)  is  used. 

The  following  two  lemmas  show  that  unless  the  inertia  of  nfK /7j  is  kept  equal 
to  (nj,mi,0)  until  all  rows  of  A  have  been  processed,  it  may  happen  that  no  non¬ 
singular  principal  submatrix  with  m  negative  eigenvalues,  containing  K\\  and  all 
rows  of  A,  exists.  Consequently,  the  results  from  Section  4  are  not  applicable  and 
there  is  no  guarantee  that  we  are  able  to  compute  descent  directions  and  directions 
of  negative  curvature  as  described  in  Section  4. 

Lemma  5.1.  Let  M  denote  anmxn  matrix  of  full  row  rank,  where  m  <  n.  Assume 
that  a  partition  of  M  is  given  such  that 


where  Mi  has  full  row  rank.  Given  a  nonzero  vector  x  such  that  M\x  =  0,  M2  may 
be  such  that  Mx  =  0. 

Proof.  If  1  ^  0  and  M\x  =  0,  it  follows  that  the  rows  of  M\  are  orthogonal  to  x. 
The  null  space  of  xT  has  dimension  n  —  1.  Consequently,  if  m  <  n,  the  rows  of  M 2 
may  be  orthogonal  to  1  | 


Lemma  5.2.  Assume  that  n  >  m.  If  ln(I\n)  =  (ni -t,mx  +  <,0),  where  t  >  0,  the 
matrix  A  may  be  such  that  any  nonsingular  principal  submatrix  of  K ,  containing 
K\\  and  all  rows  of  A,  has  more  than  m  negative  eigenvalues,  independently  of  Hu, 
H21  and  H22 • 

Proof.  If  t  >  0,  Lemma  2.1  implies  that  there  exists  a  nonzero  vector  pi  such  that 
pjHi\pi  <  0  and  AnPi  =  0.  If  n  >  m,  Lemma  5.1  shows  that  Au,  A21  and  A22 
may  be  such  that  A21P1  =  0  in  which  case 

/  An  A12  \ 

\  A2i  A22  /  \  0  /  \  0  / 


and 
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Consequently,  Lemma  2.1  implies  that  K  has  more  than  m  negative  eigenvalues.  If 
any  nonsingular  principal  submatrix  of  K  containing  K\\  and  all  rows  of  A  is  con¬ 
sidered,  by  extending  the  vector  p\  with  the  appropriate  number  of  zero  elements,  it 
follows  that  such  a  principal  submatrix  has  more  than  m  negative  eigenvalues.  This 
property  depends  only  on  the  matrices  A  and  An,  and  therefore  it  is  independent 
of  the  matrices  Hu,  H^x  and  H22.  I 


The  following  example  illustrates  that  even  when  the  reduced  Hessian  is  positive 
semidefinite,  if  it  is  singular,  then  it  may  be  necessary  to  maintain  a  An  with  the 
number  of  negative  eigenvalues  equal  to  mi  for  Theorem  4.1  to  apply.  Let 

0  )  and  A  =  (  /  0  )  ,  (5.1) 

for  which  the  reduced  Hessian  is  zero.  If  the  corresponding  A-matrix  is  factorized, 
the  first  pivot  must  be  either  of  type  HA  or  HH .  If  an  element  of  type  HH  is  chosen 
as  first  pivot,  it  is  not  possible  to  obtain  a  nonsingular  principal  submatrix  of  K, 
containing  this  pivot  and  having  as  many  negative  eigenvalues  as  it  contains  rows  of 
A.  Consequently,  Theorem  4.1  does  not  apply.  For  this  situation,  when  the  reduced 
Hessian  is  positive  semidefinite  and  singular,  Conn  and  Gould  [CG84]  have  shown 
how  to  compute  a  descent  direction  from  a  single  factorization  of  K.  That  scheme 
is  not  considered  here,  since  it  has  not  been  shown  that  the  descent  direction  given 
by  that  scheme  satisfies  the  conditions  of  Theorem  4.1.  In  practice,  however,  the 
scheme  of  Conn  and  Gould  [CG84]  may  be  a  viable  alternative. 

On  the  other  hand,  it  is  of  interest  to  detect  as  early  as  possible,  if  K  has  inertia 
different  from  (n,m,0).  However,  the  next  lemma  shows  that  given  any  nonsingular 
principal  submatrix  A'n  with  no  more  than  m  negative  eigenvalues,  it  may  hold 
that  A  is  nonsingular  and  has  m  negative  eigenvalues. 

Lemma  5.3.  If  In(Kn)  =  (ni  -  t,m\  +  t, 0),  where  m\  +  t  <  m,  it  may  hold  that 
In(K)  =  (n,m,0). 

Proof.  If  In(Ku)  =  (nj  -  t,mj  +  t,0),  where  mi  +  t  <  m,  it  may  hold  that 
An  can  be  expanded  with  all  rows  of  H  so  that  Jn  (An)  =  (n  —  f,n»i  + 1,0),  where 
mi  +  t  <  mi  +t  <  m.  Consequently,  if  we  let  Z  denote  a  matrix  whose  columns  form 
an  orthonormal  basis  for  the  null  space  of  the  rows  of  A  contained  in  the  expanded 
An,  Lemma  2.1  yields  In(ZTHZ)  =  (n  -  mi  -  t,f,0).  Let  v,,  i  =  l,...,n  -  mi, 
denote  orthonormal  eigenvectors  corresponding  to  eigenvalues  Aj  of  ZTHZ,  where 
the  eigenvectors  are  sorted  so  that  Aj  <  A*+i.  If  the  remaining  rows  of  A  equal 
vJZT,  i  =  1, . . . ,  m  -  mi,  then  A  has  full  row  rank  and  pTHp  >  0  for  all  p  ^  0  such 
that  Ap  =  0.  Consequently,  Lemma  2.1  yields  In{K )  =  (n,m,0).  I 


6.  A  Sufficient  Pivoting  Method 


21 


6.  A  Sufficient  Pivoting  Method 

If  the  sufficient  pivot  conditions  described  in  Section  4  are  applied  when  factoriz¬ 
ing  K,  descent  directions  and  directions  of  negative  curvature  may  be  computed  as 
described  in  Theorems  4.1  and  4.2.  In  particular,  the  computed  directions  satisfy 
sufficiency  requirements  for  applying  known  results  for  unconstrained  minimiza¬ 
tion.  In  this  section  we  review  two  different  methods  and  apply  them  to  our  linear 
equality-constrained  problem.  The  first  method  is  a  curvilinear  linesearch  method 
and  the  second  is  a  regular  linesearch  method.  In  the  discussion,  at  a  particular 
iterate  k,  the  methods  use  a  feasible  descent  direction  sk  and  a  feasible  direction  of 
negative  curvature  dk.  The  direction  dk  always  satisfies  djgk  <  0.  If  no  feasible  de¬ 
scent  direction  exists,  s k  is  to  be  interpreted  as  a  zero  vector.  Similarly,  if  no  feasible 
direction  of  negative  curvature  exists,  d k  is  to  be  interpreted  as  a  zero  vector. 

6.1.  A  curvilinear  linesearch  method 

In  the  method  for  unconstrained  minimization  suggested  by  More  and  Sorensen 
[MS 79],  the  next  iterate  xk+\  is  determined  by  searching  along  an  arc  emanating 
from  xk,  defined  by  the  univariate  function 

<t>k(a)  =  f(xk  +  <*2*k  +  <*dk),  (6.1) 

where  sk  is  a  descent  direction  and  dk  is  a  direction  of  negative  curvature.  In  this 
subsection,  it  is  assumed  that  sk  is  computed  as  in  Theorem  4.1.  It  is  also  assumed 
that  dk  is  computed  as  in  Theorem  4.2  whenever  ZTHkZ  has  at  least  one  negative 
eigenvalue.  If  ZTHkZ  has  no  negative  eigenvalues,  then  dk  =  0. 

Let  p.  6  (0,1),  rj  e  [/z,  1)  and  /?  >  0  denote  preassigned  constants.  The  scalar 
ak  G  (0,  /?]  is  determined  in  such  a  way  as  to  attempt  to  satisfy 

</>*(<**)  <  &(0)  +  \p4>k{H)a\  and  (6.2a) 

<t>'kM  >  *?(<&(»)  +  </4'(0)«fc).  (6.2b) 

In  practice,  condition  (6.2b)  is  not  always  satisfied  by  q^.  For  a  complete  discussion 
on  how  to  generate  ak  we  refer  to  the  original  paper  of  More  and  Sorensen  [MS79]. 
The  next  iterate  ifc+i  is  defined  as 

xk+i  =  xk  +  aksk  +  akdk.  (6.3) 

The  vectors  sk  and  dk  are  both  feasible,  and  we  may  write  sk  =  Zuk  and 
dk  =  Zvk  for  some  vectors  uk  and  vk.  Consequently,  utilizing  uk  and  vk,  the 
problem  is  unconstrained.  The  following  lemma  shows  that  uk  and  vk  satisfy  the 
requirements  of  More  and  Sorensen  [MS79]. 

Lemma  0.1.  Let  denote  an  infinite  sequence  of  points  in  S(x  o).  For  each 

such  point  xk,  assume  that  sk  is  computed  as  in  Theorem  4-1  and  assume  that  dk  is 
computed  as  in  Theorem  4-2  whenever  ZTHkZ  has  at  least  one  negative  eigenvalue , 
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and  that  dk  =  0  otherwise.  Then,  it  holds  that  sk  =  Zu k  for  some  uk  and  dk  =  Zvk 
for  some  vk  such  that 

gkZuk  — ►  0  implies  ZTgk  —*  0  and  Uk  — *  0 


and 

vkZTHkZvk  — *  0  implies  A^  — *  0  and  <4  — 1 ►  0, 
where  A*  is  the  minimum  of  zero  and  Amj n(ZTHkZ). 

Proof.  Theorem  4.1  implies  that  sk  =  Zu k  for  some  u*.  Assume  that  gJZuk  -*  0. 
Theorem  4.1  implies  that  ZTgk  — >  0  and  uk  — »  0. 

Theorem  4.2  implies  that  <4  =  Zvk  for  some  vk.  If  Xmin(ZTHkZ)  >  0  for  some 
k,  then  Vk  =  0.  Consequently,  without  loss  of  generality,  it  may  be  assumed  that 
■^min (ZTHkZ)  <  0  for  all  k.  Assume  that  vJZTHkZvk  — ►  0.  Theorem  4.2  implies 
that  Amin (ZTHkZ)  — ►  0.  Using  the  assumed  boundedness  of  ||£||,  it  follows  from 
(4.17)  and  (4.19)  that  there  exists  a  positive  constant  c  such  that 

\\Zvk\\  <  c(-vlZTHkZvk )1'*. 

Hence,  since  the  columns  of  Z  are  linearly  independent,  vJZTHkZvk  — ►  0  implies 
that  vk  — ►  0,  as  required.  | 

Consequently,  using  Lemma  6.1,  the  results  of  More  and  Sorensen  [MS79]  may 
be  applied  directly. 

Theorem  6.1.  (More  and  Sorensen  [MS79])  If  an  infinite  sequence  {xjt}£L0  is 
generated  as  defined  in  (6.3),  any  limit  point  x  satisfies 

ZTVf(x)  =  0  and  X^n{ZTV2f{x)Z)>Q. 

Proof.  See  More  and  Sorensen  [MS79,  Theorem  6.2).  | 

6.2.  A  regular  linesearch  method 

The  method  proposed  by  Forsgren  et  al.  [FGM89b]  is  a  method  that  uses  regular 
linesearch  at  all  iterates.  In  order  to  modify  this  unconstrained  method  to  a  method 
for  linear  equality-constrained  problems  utilizing  the  £H£T-factorization,  a  positive 
constant  e  is  given  and  the  direction  of  negative  curvature  dk  is  computed  as  in  The¬ 
orem  4.2,  but  <4  =  0  if  d[Hkdk  >  — c.  Note  that  djHkdk  is  available  without  needing 
to  compute  <4,  since  Theorem  4.2  yields  d%Hkdk  =  -AjCn(i4).  The  direction  sk  is 
assumed  to  be  computed  as  in  Theorem  4.1. 

At  iteration  k,  the  search  direction  pk  is  defined  to  be  a  linear  combination  of 
the  descent  direction  sk  and  the  direction  of  negative  curvature  <4,  as 


Pk  =  sk  +  /4«4, 
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where  the  scalar  fik  is  zero  if  d k  =  0,  and  it  is  chosen  so  that  pj/f^p*  <  dTkHkik 
otherwise.  We  refer  to  Forsgren  et  al.  [FGM89b]  for  details. 

Let  p  and  7  denote  preassigned  constants  such  that  p  €  (0,^)  and  7  G  (0,1). 
Given  xk  and  a*  G  [<*nvint<*mAx])  where  cumin  and  amax  are  positive  constants,  the 
number  i*  is  defined  to  be  the  smallest  nonnegative  integer  i  such  that 

f(xk  +  7 '(XkPk)  <  f(xk)  +  P7 '(*k9kPk  if  4  =  0;  (6.4a) 

f(xk  +  7 '(*kPk)  <  /(**)  +  P7'«jt 9kPk  +  ~  72  akPkHkPk  if  4  #  0.  (6.4b) 

The  next  iterate  zt+i  is  defined  as 

ifc+i  =  lit  +  3,kakpk.  (6.5) 

With  sk  and  dk  computed  from  Theorems  4.1  and  4.2,  pk  satisfies  Apk  =  0  and 
we  may  write  pk  =  Zvk  for  some  vk.  The  analogy  with  the  unconstrained  case 
is  clearer  if  this  representation  is  used.  The  convergence  of  this  linesearch  follows 
from  the  convergence  analysis  of  Kaniel  and  Dax  [KD79].  The  reduced  gradient  is 
zero  the  smallest  eigenvalue  of  the  reduced  Hessian  is  greater  than  a  small  negative 
number  at  all  limit  points. 

Theorem  6.2.  If  an  infinite  sequence  {xjt}fcio  ,s  generated  as  defined  in  (6.5),  any 
limit  point  x  satisfies 

ZTV/(x)  =  0  and  Amin(ZrV2/(i)Z)  >  -cfe, 
where  c  is  a  positive  constant. 

Proof.  Since  pjHkpk  <  d%Hkdk  <  -e  whenever  dk  /  0,  it  follows  from  the 
convergence  analysis  of  Kaniel  and  Dax  [KD79]  that  there  must  exist  a  finite  I 
such  that  dk  =  0  for  ail  k  >  I.  Moreover,  it  follows  from  the  convergence  anal¬ 
ysis  of  Kaniel  and  Dax  [KD79]  that  Theorem  4.1  gives  a  pk  =  Zuk  such  that 
limfc_oo  ZTgk  =  0.  Theorem  4.2  guarantees  the  existence  of  a  positive  constant 
c  such  that  Ami n(ZTHkZ)  >  —  Cy/l  for  all  k  >  I.  The  result  now  iollows  from  the 
continuity  of  V2/.  | 


7.  Artificial  Constraints 

In  Section  4  we  discussed  what  conditions  may  be  imposed  upon  the  pivots  in 
order  to  ensure  the  ability  to  compute  descent  directions  and  directions  of  negative 
curvature  from  a  single  factorization  of  K .  An  alternative  strategy  for  yielding  a 
descent  direction  or  a  direction  of  negative  curvature  would  be  to  factorize  K  using 
a  regular  Z5Z,T-factorization,  and  from  the  inertia  of  B  deduce  the  inertia  of  the 
reduced  Hessian.  If  the  reduced  Hessian  has  at  least  one  negative  eigenvalue,  an 
artificial  constraint  may  be  added  to  A,  so  that  the  number  of  positive  eigenvalues 
of  K  is  increased  by  one,  and  consequently  the  number  of  negative  eigenvalues  of  the 
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reduced  Hessian  is  reduced  by  one.  An  artificial  constraint  is  a  temporary  additional 
constraint,  that  is  not  specified  in  the  original  problem.  The  only  requirement  for 
an  artificial  constraint  is  linear  independence  from  the  original  constraints  and  other 
artificial  constraints.  Artificial  constraints  do  not  restrict  the  feasible  region,  since 
they  are  only  introduced  at  a  particular  iterate,  and  may  be  removed  from  the 
problem  at  any  iterate.  If  a  scheme  for  adding  artificial  constraints  is  known,  a 
positive  definite  reduced  Hessian  could  be  obtained  by  adding  a  sufficient  number 
of  such  constraints.  Unless  the  dimension  of  the  reduced  Hessian  has  been  reduced 
to  zero,  a  descent  direction  can  be  obtained.  A  method  for  computing  a  direction  of 
negative  curvature  for  a  positive-definite  reduced  Hessian  in  the  presence  of  artificial 
constraints  has  been  proposed  by  Forsgren  et  at.  [FGM89a]. 

However,  as  the  following  lemma  shows,  to  find  an  artificial  constraint  that  re¬ 
duces  the  number  of  negative  eigenvalues  of  the  reduced  Hessian  by  one  is  equivalent 
to  finding  a  direction  of  negative  curvature  in  the  null  space  of  A.  Consider  the  case 
when  a  nonsingular  KKT-matrix 


K  = 


( 


H 

A 


At 

0 


is  given,  where  K  has  more  than  m  negative  eigenvalues.  The  question  of  finding 
an  additional  artificial  constraint  a  such  that 


(  H  At  a\ 


I<  = 


0  0 
0  0/ 


has  one  more  positive  eigenvalue  than  K  is  equivalent  to  finding  a  direction  of  neg¬ 
ative  curvature  for  the  reduced  Hessian  corresponding  to  K.  The  precise  statement 
is  given  in  the  following  lemma,  and  uses  the  solution  of  the  equation 


(7.1) 


Lemma  7.1.  If  In(K)  =  In(K)  -f  (1,0,0),  then  p  from  (7.1)  is  a  direction  such 
that  Ap  =  0  and  jPHp  <  0.  Conversely,  if  q  is  a  direction  such  that  Aq  =  0  and 
qTHq  <  0,  then  In(K)  =  In(K)  +  (1,0,0)  for  a  =  Hq. 

Proof.  Assume  that  In(K)  =  In( K)  +  (1,0,0).  Let  wT  =  ( aT  0)  and  let  uT  =  ( pT 
pT).  It  follows  that 


and  that  u  solves  the  equation  Ku  =  w.  Sylvester’s  law  of  inertia  implies  that 
wTK~lw  <  0.  Using  the  identity  Ku  =  w ,  it  follows  that  uTKu  <  0.  Consequently, 
(7.1)  yields  pTHp  =  uTKu  <  0  and  Ap  =  0. 
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Assume  that  q  is  a  vector  such  that  Aq  =  0  and  qTHq  <  0.  Let  uT  =  (  qT  0) 
and  w  =  Ku.  We  get  wTK~lw  =  qTHq  <0.  If  a  =  Hq,  it  follows  that  wT  =  (aT 
0).  Sylvester’s  law  of  inertia  implies  that  In(R)  =  In(K)  +  (1,0,0).  | 

Consequently,  the  ability  to  add  an  artificial  constraint  is  linked  with  the  ability 
to  compute  a  direction  of  negative  curvature  in  the  null  space  of  A.  As  an  example, 
consider  the  case  when  no  constraints  exist,  and  H  =  I  —  eeT,  where  e  is  an  n- vector 
with  all  components  one.  This  matrix  has  one  negative  eigenvalue  and  n-  1  positive 
eigenvalues.  If  less  than  n  artificial  bounds  are  added  to  H ,  the  corresponding 
A'-matrix  will  have  a  reduced  Hessian  that  is  not  positive  definite.  However,  if 
the  single  artificial  constraint  eT  is  added,  the  corresponding  reduced  Hessian  is 
positive  definite.  Consequently,  although  there  exist  artificial  constraints  to  add, 
we  do  not  know  how  to  compute  them  directly.  Conn  and  Gould  [CG84]  have 
given  a  computational  scheme  for  obtaining  a  direction  of  negative  curvature  from 
the  L B A^-factors  of  K.  However,  this  scheme  requires  the  solution  of  a  system  of 
equations  with  an  m  X  m  “triangular-like”  matrix. 

8.  A  Descent  Method 

If  no  attempt  is  made  to  avoid  altering  ti.e  nhT-n.atrix  when  the  reduced  Hessian 
is  positive  definite,  we  may  cens'cer  an  algorithm  that  yields  a  descent  direction 
in  a  single  factorization.  When  forming  the  factors,  pivots  corresponding  to  ele¬ 
ments  of  H  are  modified,  if  necessary,  so  that  the  principal  submatrix  factorized  at 
each  step  is  nonsingular  and  n<»^  as  ma.iy  positive  eigenvalues  as  it  contains  rows 
of  H .  This  modification  corresponds  to  adding  to  the  diagonal  of  H ,  and  may  be 
expressed  as  a  positive  semidefinite  diagonal  matrix  D  of  the  same  dimension  as  H. 
If  the  pivots  are  modified  so  that  the  factorized  principal  submatrix  has  its  small¬ 
est  singular  value  bounded  away  from  zero  by  a  constant,  this  yields  a  correction 
matrix  D  with  bounded  norm  such  that  ZT(H  +  D)Z  is  positive  definite  and  has 
its  smallest  eigenvalue  bounded  away  from  zero  by  a  constant.  Such  a  correction 
can  be  computed  in  a  single  factorization  and  requires  only  the  permutations  of  a 
regular  Z2?Ar-algorithm.  However,  in  this  method,  the  correction  matrix  D  may  be 
substantial  even  if  the  reduced  Hessian  is  positive  definite.  Therefore,  there  is  no 
guarantee  that  this  method  has  the  same  rate  of  convergence  as  Newton’s  method. 
For  example,  if  the  original  ordering  of  K  is  used  in  the  factorization,  this  method 
will  modify  H  so  that  H  +  D  is  positive  definite.  Consequently,  if  ZTH  Z  is  posi¬ 
tive  definite,  but  not  H,  the  KKT-matrix  is  modified  unnecessarily.  On  the  other 
hand,  if  initially  m  pivots  of  type  HA  are  chosen,  H  will  be  modified  only  if  the 
reduced  Hessian  is  not  sufficiently  positive  definite.  However,  the  ordering  of  the 
latter  case  is  such  that  the  second-order  method  described  in  Section  6  would  not 
require  additional  permutations. 

Given  the  modified  A'-matrix,  a  descent  direction  p  may  be  obtained  by  solving 
the  equation 


(8.1) 
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Upon  termination  of  the  factorization  algorithm  described  in  this  section,  the 
Zf?LT- factors  of  the  left-hand  side  matrix  of  (8.1)  are  known,  and  the  search  direc¬ 
tion  p  may  be  obtained  from  these  factors.  Since  ZT(H  +  D)Z  is  positive  definite 
with  bounded  norm  and  smallest  eigenvalue  greater  than  a  positive  constant,  the 
search  direction  p  from  (8.1)  is  a  sufficient  descent  direction.  Utilizing  some  inex¬ 
act  linesearch  procedure,  for  example  the  ones  described  in  Section  6  for  the  case 
d  =  0,  it  follows  that  the  reduced  gradient  is  zero  at  all  limit  points  of  a  generated 
sequence. 

9.  A  Delta-Method 

Assume  that  for  a  fix  positive  constant  6 ,  a  search  direction  p  is  computed  at  each 
iteration  from  the  equation 

(**!r 

where  D  is  a  positive  semidefinite  matrix  with  bounded  norm  and  the  sign  of  the 
right-hand  side  vector  is  chosen  so  that  gTp  <  0.  The  matrix  D  has  all  elements 
zero  when  the  reduced  Hessian  is  sufficiently  positive  definite.  Nonzero  elements 
in  D  arise  for  two  reasons.  Firstly,  to  ensure  that  the  smallest  singular  value  of 
the  left-hand  side  matrix  of  (9.1)  is  bounded  away  from  zero.  Secondly,  if  having 
formed  an  initial  factorization  there  are  more  than  m  negative  eigenvalues,  we  may 
alter  B,  where  possible,  to  reduce  the  number  of  negative  eigenvalues.  The  constant 
6  may  be  chosen  numerically  small,  for  example  in  the  order  of  the  square-root  of 
the  machine  precision.  Ideally,  the  ordering  in  the  factorization  of  the  modified  K- 
matrix,  where  61  is  added  to  H,  may  then  be  kept  to  whatever  is  satisfactory  for 
preserving  sparsity.  However,  as  was  pointed  out  by  the  example  (5.1),  the  Schur 
complement  may  be  singular,  and  there  may  be  no  rows  of  H  left,  so  equation  (9.1) 
cannot  be  guaranteed  to  be  solved  using  a  single  factorization.  However,  if  it  is 
determined  that  a  single  factorization  is  impossible,  by  refactorizing,  utilizing  the 
factorization  method  described  in  Section  8  ,  we  may  always  guarantee  a  positive 
semidefinite  correction  matrix  D  such  that  the  left-hand  side  matrix  of  (9.1)  is 
sufficiently  removed  from  a  singular  matrix.  In  practice,  we  may  hope  that  a  small 
value  of  6  would  not  impact  the  rate  of  convergence  compared  with  Newtons  method 
in  the  neighborhood  of  a  local  minimizer  where  the  reduced  Hessian  is  positive 
definite.  Moreover,  it  may  be  expected  that  the  event  that  a  single  factorization  is 
impossible,  due  to  singularity  arising  when  factorizing  the  left-hand  side  matrix  of 
(9.1),  is  rare. 

The  following  lemma  shows  that  unless  ZTg  =  0,  the  search  direction  from  (9.1) 
is  a  nontrivial  descent  direction  or  a  nontrivial  direction  of  negative  curvature. 

Lemma  9.1.  The  vector  p  from  (9.1)  satisfies  p  =  Zv  for  some  v.  It  holds  that 
vTZTg  <  0  and  at  least  one  of  the  following  conditions  is  satisfied: 

vTZTg  <  -6-vtZtZv 
*  -  2 


(9.2a) 
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vtZtHZv  <  -6-vtZtZv.  (9.2b) 

Proof.  Since  Ap  =  0  it  follows  that  p  =  Zv  for  some  v.  The  sign  of  p  is  always 
chosen  so  that  vTZTg  <  0.  Utilizing  (9.1)  we  obtain 

vtZtHZv  <  -vTZTg  -  6vtZtZv  -  vTZTDZv.  (9.3) 

Assume  that  (9.2a)  does  not  hold.  Since  D  is  positive  semidefinite,  (9.3)  yields 

vtZtHZv  <  -  6-vtZtZv, 
and  consequently  (9.2b)  holds.  | 

If  the  same  linesearch  as  for  the  method  of  Section  6.2  is  applied,  we  can  show 
that  the  reduced  gradient  is  zero  at  all  limit  points.  Let  p  and  7  denote  preassigned 
constants  such  that  p  €  (0,  ^)  and  7  6  (0, 1).  Given  Xk  and  a*  6  [amin,<*miix],  where 
<*min  and  amax  are  positive  constants,  the  number  ik  is  defined  to  be  the  smallest 
nonnegative  integer  i  such  that 

f(xk  +  l'akpk)  <  f{xk)  + pYak9kPk  if(9.2a)holds;  (9.4a) 

„2  2»Q2 

f{xk  +  7 '<*kPk)  <  f(xk)  +  Pl'akglpk  +  — 1 ^PfctfjfcPfcOtherwise.  (9.4b) 

The  next  iterate  xjt+i  is  defined  as 

Xk+1  =  Xk  +  7  '"oikPk-  (9.5) 

Since  pk  satisfies  Apk  =  0,  we  may  write  pk  =  Zvk  for  some  1 >k-  The  analogy 
with  the  unconstrained  case  is  clearer  if  this  representation  is  used. 

Theorem  9.1.  If  an  infinite  sequence  Is  generated  as  defined  in  (9.5),  any 

limit  point  x  satisfies  ZTVf(x)  =  0. 

Proof.  Consider  a  sequence  {£*}£L0-  Assume  that  there  for  some  positive  constant 
e  exists  an  infinite  subsequence  such  that  ||Z7'(fc||  >  e  for  all  k  €  J ■  The 

matrix  on  the  left-hand  side  of  (9.1)  has  its  norm  bounded  by  a  constant,  and 
consequently  there  exists  a  positive  constant  cj  such  that  ||ujt||  >  cj  for  all  k  G  J- 
It  follows  from  Lemma  9.1  and  the  convergence  analysis  of  Kaniel  and  Dax  [KD79] 
that  there  exists  a  finite  I  such  that  (9.2a)  holds  for  all  k  6  J  such  that  k  >  I.  Since 
(9.2a)  holds  for  all  &  €  J  such  that  k  >  /,  it  follows  from  the  convergence  analysis 
of  Kaniel  and  Dax  [KD79]  that  there  exists  a  finite  I  such  that  ||ZT<7itj|  <  e  for  all 
k  6  J  such  that  k  >  /.  Consequently,  limjt_oo  ZTgk  =  0,  as  required.  | 

However,  since  the  search  direction  is  zero  whenever  ZTg  =  0,  it  follows  that  no 
stronger  result  than  convergence  to  a  first-order  point  is  possible  with  this  method.  A 
slightly  modified  method  may  be  obtained  by  letting  6  be  variable,  i.e.,  define  a  value 
6k  at  the  fc-th  iteration.  If,  at  the  fc-th  iteration,  the  initial  LBLT- factorization  has 
more  than  m  negative  eigenvalues,  then  £*+1  >  6k,  in  order  to  reduce  the  amount 
of  negative  curvature  in  the  null  space  of  A.  Otherwise,  6k+i  <  6k .  If 
converges  to  a  solution  where  ZTH Z  is  positive  definite,  then  6k  may  be  reduced  so 
that  lim/t_00  6k  =  0,  ensuring  that  the  asymptotic  rate  of  convergence  is  identical 
to  that  of  Newton’s  method. 
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10.  Discussion 

The  pivot  strategy  discussed  in  Section  4  is  more  restrictive  than  a  regular  LBLT- 
factorization,  since  it  only  allows  pivots  of  type  H+,  A~  and  HA  until  all  rows  of 
A  have  been  processed.  Consequently,  if  an  initial  ordering  is  given  by  the  analyze 
phase,  this  pivot  strategy  is  likely  to  change  the  ordering  more  than  a  regular  LBLT- 
factorization.  To  attempt  to  maintain  sparsity  of  the  factors,  it  is  desirable  to 
reduce  the  number  of  additional  pivots  required  in  the  numerical  phase.  In  some 
circumstances  it  may  be  possible  to  accept  pivots  of  other  type  than  H* ,  A~  and 
HA.  It  was  shown  in  Section  3.4  that  modifying  a  diagonal  element  of  B  of  type 
H~  or  HH  may  alter  the  A-part  and  the  zero-part  of  K.  However,  the  only  altered 
elements  correspond  to  nonzeros  in  the  outer  product  created  by  the  corresponding 
column  or  columns  of  L.  It  is  a  simple  matter  to  check  if  these  nonzeros  of  L 
correspond  to  rows  of  A.  If  they  do  not,  the  particular  H~  or  HH  pivot  can  be 
accepted  even  if  the  number  of  negative  eigenvalues  of  K\\  exceeds  m\  by  doing  so. 
When  K  is  sparse,  it  may  be  a  common  event  that  the  nonzeros  do  not  correspond 
to  rows  of  A  once  a  significant  portion  of  the  rows  of  A  have  been  processed.  If 
the  nonzeros  do  not  correspond  to  rows  of  A,  the  reduced  Hessian  has  at  least  one 
negative  eigenvalue,  and  we  can  still  obtain  a  feasible  direction  of  negative  curvature. 

Another  scheme  is  to  accept  any  pivot  a  regular  LBLT- factorization  would  ac¬ 
cept,  keeping  track  of  restart  points  where  K\\  has  inertia  (ni,mx,0)  and  is  suffi¬ 
ciently  nonsingular.  If  it  turns  out  that  K  has  inertia  (n,  m,  0),  the  reduced  Hessian 
is  positive  definite,  and  the  Newton  direction  is  a  descent  direction.  Otherwise, 
when  forming  the  factorization,  let  K\\  denote  the  part  of  I(  that  is  factorized 
when  it  is  discovered  from  the  Schur-complement  that  the  inertia  of  K  is  different 
from  (n,m,0).  If  Ku  contains  all  rows  of  A,  has  inertia  (ni,m,0)  and  is  sufficiently 
nonsingular,  the  results  of  Section  4  apply.  If  not  all  rows  of  A  have  yet  been  pro¬ 
cessed,  an  attempt  may  be  made  to  find  pivots  of  type  H+  and  A+,  in  order  to 
form  a  K n  that  is  sufficiently  nonsingular,  contains  all  rows  of  A  and  has  m  nega¬ 
tive  eigenvalues.  If  this  attempt  is  successful,  Theorems  4.1  and  4.2  apply,  and  the 
desired  directions  may  be  computed.  If  the  attempt  is  not  successful,  part  of  K n, 
from  the  latest  restart  point,  may  be  refactorized,  imposing  the  pivoting  strategy  of 
Section  4. 

If  the  number  of  positive  eigenvalues  in  the  reduced  Hessian  is  large  compared 
to  the  number  of  negative  eigenvalues,  we  expect  the  number  of  pivots  of  type 
H~  and  HH  to  be  low.  Consequently,  if  the  rows  of  A  are  processed  early  in  the 
factorization,  there  is  a  high  likelihood  that  these  pivots  will  occur  only  after  all 
rows  of  A  have  been  processed.  A  new  version  of  MA27  (MA47)  allows  2x2  pivots 
in  the  analyze  phase,  see  Duff  et  al.  [DGR*89].  Moreover,  these  pivots,  which 
in  our  case  would  be  HA  pivots,  are  preferred  over  lxl  pivots.  We  expect  this 
scheme  to  make  the  difference  between  the  additional  permutation  requirements 
of  the  scheme  of  Section  4  and  the  additional  permutations  required  by  a  regular 
L  B  LT- factorization  smaller,  since  in  many  instances  the  conditions  of  Section  4  will 
be  fulfilled  automatically. 

It  may  also  be  observed,  that  the  ability  to  compute  a  direction  of  negative 
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curvature  is  only  required  if  the  reduced  gradient  is  small  in  norm.  The  methods 
described  in  Sections  8  and  9  may  be  applied  whenever  the  norm  of  the  reduced 
gradient  is  sufficiently  positive.  Only  at  points  where  the  reduced  gradient  has 
small  norm  and  the  reduced  Hessian  is  not  positive  definite  is  it  necessary  to  apply 
the  strategy  of  Section  4. 

As  was  discussed  in  Section  7,  to  add  a  suitable  artificial  constraint  is  equivalent 
to  generating  a  feasible  direction  of  negative  curvature.  It  can  be  shown  that  an 
artificial  constraint  that  is  linearly  independent  of  the  constraints  in  A  may  be 
found  by  one  solve  with  K  utilizing  a  suitable  right-hand  side.  If  this  artificial 
constraint  increases  the  number  of  positive  eigenvalues  of  K  by  one,  a  direction  of 
negative  curvature  may  be  computed  as  described  in  Lemma  7.1.  Although  this  is 
not  guaranteed  to  be  the  case,  in  practice,  it  may  be  a  viable  strategy. 
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